Can Data Be Heretical?


And what would heretical data even look like?

What makes this heretical data (or not)?

August Hahn, TOC from
Thilo, Codex Apocryphus, 1832

August Hahn, Marcion’s Gospel,
first modern Greek edition

What makes this heretical data?

van Manen, TOC,
in Theologisch Tijdschrift, 1887

van Manen, Marcion’s Apostle,
first modern Greek edition
nb: Galatians only

What makes this heretical data?

Theodor Zahn, Marcion’s Gospel (1892)

Theodor Zahn, Marcion’s Apostle (1892)

What makes this heretical data?

Adolf von Harnack, Marcion’s Gospel (1924)

Adolf von Harnack, Marcion’s Apostle (1924)

What makes this heretical data?

Ulrich Schmid, Marcion’s Apostle (1995)

Dieter Roth, Marcion’s Gospel (2015)

Raison d’écrire


As the saying goes, “Children are meant to be seen, not heard.”

So also here, “Marcion’s texts are meant to be seen, not heard,
neither privately nor liturgically.”

Among editions of ancient Christian texts, Marcion’s scriptures
seem uniquely, and d’habitude, incapable of being read aloud,
intrinsically interdit from reconstruction and performance as scripts.

What makes texts into heretical data is rendering them,
in every way that matters, unperformable.

*Disclaimer: this is not a plug for the “Marcionite Church”,
whose textual work is entirely lacking in rigor and merit,
even though Wikipedians seem to think otherwise.

Hermeneutics of Idiosyncratic Subjectivity


Editions that have often been held up as definitive or authoritative,

–Brill’s fairly recent critical edition in particular–

take to an almost absurd extreme the nuancing of indications
to represent the reconstructor’s intricate feelings of doubt,
building convoluted bespoke systems of font styling
in an elaborate effort to represent those feelings.

That is how to make and keep data heretical!

Hermeneutics of Representational Obfuscation


These editions often perform strange things in the main running text:

  • embedding variants within the text;
  • switching languages in the presentation of the text;
  • filling the text with labels such as “unattested” and “not present”;
  • placing descriptive/comparative commentary in the text;
  • obsessively annotating custom indications of editorial doubt;
  • and sometimes even skipping over well-attested wording because the attestations are too numerous and/or complex (!)


That is how to make and keep data heretical!

Obfuscation in Action: Roth on Ev 6.43

Dieter Roth, The Text of Marcion’s Gospel, p. 415

Elsewhere, Roth quotes 8 attestations of this verse by 6 different authors:

  • Tertullian, Marc. 4.17.12 and five comparanda
  • Origen, princ. 2.5.4
  • Hippolytus, Haer. 10.19.3
  • Ps-Tertullian, Haer. 6.2
  • Philastrius, Diver. heres. 45.2
  • Ps-Adamantius, Dial. 56.14-16, 58.11-13, 58.13-16

Yet Roth only ends up restoring 5 words, compared to:

  • Older: Hahn (14), Zahn (15), Harnack (13)
  • Newer: BeDuhn (14), Klinghardt (13), Nicolotti (13)

Hermeneutics of Patristic Cartesianism



These same editions also proceed from the premise that:

∵ the Church Fathers are our main source and starting point for the reconstruction of Marcion’s scriptures in general;

∴ we can only have verifiable knowledge about a specific chapter, verse, phrase, or word if the Church Fathers attested to it.


I tend to call this mindset “Patristic Cartesianism”.
One might better classify it as Naïve Patristic Verifiability.

That is how to make and keep data heretical!

Hermeneutics of Probabilistic Pattern Verifiability



Let’s contrast Naïve Patristic Verifiability with Probabilistic Pattern Verifiability.

  • Scientific knowledge, and modern data science in particular, is about observable data and verifiable data patterns.
  • These patterns are everywhere and in everything, if one takes the time, effort, and attention to observe them carefully.
  • Uncovering them, and using them to make probabilistic decisions at the macro and micro level, is essential to the practice of data science.

To be Useful & Usable,
Data Must be Normalized*





*even heretical data

Normalizing the Data


Data normalization means standardizing all the data in every way:

  • character encodings (Unicode UTF-8)
  • file types (CSV, TSV, XML)
  • field types (location, text, attestations, notes, etc.)
  • schema types (TEI-XML; e.g., Perseus, PTA) & validation
  • identifier types (cts_urn; DOIs)
  • version control (Git; Zenodo)
  • distribution architecture
    • CapiTainS: Citable APIs Protocols Interoperability Text Standards
    • GitHub hooktest & release Action
  • file copyright and licensing (Creative Commons)
  • binary editorial rendering decisions
  • structured apparatus, separate from main text & attestations
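
A minimal sketch of the first few items above, assuming Python and only the standard library. The field names are hypothetical (loosely modelled on the field types in the list), and the sample row is placeholder text, not a real reconstruction:

```python
import csv
import io
import unicodedata

# Hypothetical column names, loosely following the field types listed above.
FIELDS = ["location", "text", "attestations", "notes"]


def normalize_text(raw: str) -> str:
    """Return NFC-normalized text with whitespace collapsed to single spaces."""
    composed = unicodedata.normalize("NFC", raw)
    return " ".join(composed.split())


def rows_to_tsv(rows: list[dict]) -> str:
    """Serialize rows as TSV with a fixed header and normalized cell values."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow({name: normalize_text(row.get(name, "")) for name in FIELDS})
    return buffer.getvalue()


# Placeholder row: a decomposed alpha + combining acute, plus stray spacing.
sample = [{"location": "Ev 6.43", "text": "  \u03b1\u0301  \u03b2 ", "notes": "placeholder"}]
tsv = rows_to_tsv(sample)
# To store the result, write it out with open(path, "w", encoding="utf-8").
```

The point of the sketch is that every cell passes through the same normalization step and every file has the same columns, so downstream tools never have to guess.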

Normalized Decisions & Confidence Levels


For attested wording, are there variants? If so, that’s cool!
But–to paraphrase Winston Wolf–

Pretty please, with sugar on top, make a decision, render the wording, and cleanly encode the variants.

If you have complex feelings of doubt about verses or words, again, cool.
But–to paraphrase Chris Farley–

Please, for the love of God, stop making ESPN rankings boards out of your critical editions.

A ten-fold nuanced hierarchy of subjectivity and indecision
serves to stunt scientific progress, not make it.

By comparison, the PTA’s meticulous TEI-XML schema allows for two levels of editorial confidence: two, as in <= 2. They are, to wit: “high” and “low”.
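
As a rough illustration of what a binary decision looks like in code, here is a sketch assuming Python. The `Reading` record and `make_reading` helper are hypothetical names invented for this example; they are not the PTA schema itself, only a stand-in for the idea of one rendered wording, one of two confidence levels, and variants kept cleanly to the side:

```python
from dataclasses import dataclass, field
from enum import Enum


class Confidence(Enum):
    """Exactly two levels of editorial confidence, as described above."""
    HIGH = "high"
    LOW = "low"


@dataclass
class Reading:
    """Hypothetical record: one rendered wording plus separately encoded variants."""
    location: str
    text: str
    confidence: Confidence
    variants: list[str] = field(default_factory=list)


def make_reading(location, text, confidence, variants=()):
    """Build a Reading; any confidence label other than 'high'/'low' raises ValueError."""
    return Reading(location, text, Confidence(confidence), list(variants))


# Placeholder wording, not a real reconstruction.
reading = make_reading("Ev 6.43", "placeholder wording", "low", ["placeholder variant"])
```

Anything like `make_reading("Ev 6.43", "placeholder wording", "probable")` fails loudly, which is the design choice: the editor has to decide.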

What Can We Do
with Normalized Marcionite Data?

Archive It for the Public & Scholarly Record


Upload it (closed, open, or embargoed; you decide) to an open science repository such as Zenodo, Dataverse, Figshare, HAL, OSF, arXiv, Humanities Commons CORE, or even a university repository.

Most of these provide free DOI minting and MD5 hashes, which make each dataset identifiable as a discrete dataset, ensure file and data integrity, and allow for automated and persistent global link resolution.
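
Checking a downloaded file against a repository's published MD5 hash can be sketched as follows, assuming Python's standard `hashlib`. The function names are my own, and the published hash would be copied from the repository's file page:

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_md5):
    """Return True if the local file's MD5 equals the hash the repository lists."""
    return md5_of_file(path) == published_md5.strip().lower()
```

MD5 here serves as an integrity check (did the file arrive unchanged?), not as a security guarantee.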

Also realize that, however useful Academia.edu and ResearchGate.net might be, they are fundamentally self-managed social media and academic profile platforms, not Open Science repositories.

Marcion’s Evangelion in Harvard Dataverse


Bilby (2021),
von Harnack’s Evangelion (1924)
in Harvard Dataverse
@ doi: 10.7910/DVN/5TEA5A

Bilby & BeDuhn (2023),
BeDuhn’s Greek Evangelion (2023)
in Harvard Dataverse
@ doi: 10.7910/DVN/UQVGW6

Marcion’s Apostolos in Harvard Dataverse


Bilby, Bull, & Lotharp (2023),
Zahn’s, van Manen’s, & Harnack’s Apostolos
in Harvard Dataverse
@ doi: 10.7910/DVN/ZUVKQW

Lotharp, Bilby, Bull, & Vinzent (2024),
Vinzent’s original Apostolos (2023)
in Harvard Dataverse
@ doi: 10.7910/DVN/2VKBVN

Submit it for Peer-Review


Bilby & BeDuhn, “BeDuhn’s
Greek Reconstruction of Marcion’s Gospel”,
Journal of Open Humanities Data (2023)
@ doi: 10.5334/johd.126

Lotharp, Bull, & Bilby, “Normalized Datasets of
Zahn’s, van Manen’s, and Harnack’s
Reconstructions of Marcion’s Apostolos”,
Journal of Open Humanities Data (2023)
@ doi: 10.5334/johd.122

Review It for a Data Journal


Because JOHD practices double- and often triple-blind peer-review,
I don’t know who all the reviewers of our team’s datasets have been.

But at least a couple of reviewers (both well-known NAPS members) have disclosed their identities to me.

Blind peer-review is important, but there is a broader movement in the scientific and grant-funding communities toward open/transparent peer-review.

Peer-review, of course, does not mean endorsement, but simply confirmation that scholarly standards of rigor and research integrity are met.

Cite It


Since the advent of the Linked Open Data movement,

datasets and code packages have been citable academic resources!

Upload, Enrich, and Update It on GitHub


Download & Sync It


  • Harvard Dataverse archival download
    • Dataverse packages in R and Python
  • GitHub current repo clone & sync
  • Jupyter notebook demo (yesterday’s DH presentation)
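
For the archival download, a hedged sketch in plain Python (no Dataverse package needed). The URL pattern follows what I understand to be the Dataverse Data Access API for downloading all files in a dataset as a zip; treat that pattern as an assumption and check it against the current Dataverse API guide before relying on it. The function names are my own:

```python
from urllib.parse import urlencode
from urllib.request import urlretrieve

SERVER = "https://dataverse.harvard.edu"


def dataset_zip_url(doi: str, server: str = SERVER) -> str:
    """Build the (assumed) Data Access API URL for a dataset's full zip download."""
    query = urlencode({"persistentId": f"doi:{doi}"}, safe=":/")
    return f"{server}/api/access/dataset/:persistentId/?{query}"


def download_dataset(doi: str, target: str = "dataset.zip") -> str:
    """Download the dataset zip to `target` and return the local path (needs network)."""
    path, _headers = urlretrieve(dataset_zip_url(doi), target)
    return path


# DOI taken from the Evangelion slide earlier in this deck.
url = dataset_zip_url("10.7910/DVN/5TEA5A")
```

Calling `download_dataset("10.7910/DVN/5TEA5A")` would then fetch the archive; the GitHub route is an ordinary clone and pull of the repository.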

Refine It with Your Friends/Team


morphological query log on
@ Marcion_Apostolos/grc6_change_logs

query hits pre- vs. post- on
@ Marcion_Apostolos/grc6_change_logs

Refine It with Your Friends/Team (cont.)

editorial proposals log on @ Marcion_Apostolos/grc6_change_logs

proposal change log on @ Marcion_Apostolos/grc6_change_logs

Release It to a DH Project


src_data repo hooktest & release on
@ nauarchus/src_data/actions

Scaife dev server sha update
@ /scaife-viewer/scaife-viewer

Read It in Your Favorite UI


Perseus Scaife dev server
Apostolos editions

Perseus Scaife dev server
Harnack’s Apostolos

Count Words and Letters by Edition



Count words and letters by edition
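
A count like this can be sketched in a few lines, assuming Python and normalized plain text per edition. The edition names and strings below are placeholders, not real reconstructions, and "word" is taken to mean a whitespace-separated token:

```python
import unicodedata


def count_words_and_letters(text: str) -> dict:
    """Count whitespace-separated words and Unicode letters in one edition."""
    normalized = unicodedata.normalize("NFC", text)
    words = normalized.split()
    # Unicode categories starting with "L" are letters; punctuation, digits,
    # spaces, and combining marks are left out of the letter count.
    letters = sum(1 for ch in normalized if unicodedata.category(ch).startswith("L"))
    return {"words": len(words), "letters": letters}


def counts_by_edition(editions: dict) -> dict:
    """Map each edition name to its word and letter counts."""
    return {name: count_words_and_letters(text) for name, text in editions.items()}


# Placeholder strings standing in for full edition texts.
editions = {
    "edition_a": "λόγος καὶ φῶς",
    "edition_b": "λόγος καὶ",
}
counts = counts_by_edition(editions)
```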

Count Verses and Words across Editions


Apostolos editions: verse & word counts

Match Tokens and Map Similarity Scores


word match heatmap

lemma match heatmap
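
One way to produce the numbers behind a match heatmap is pairwise set overlap. The sketch below assumes Python and uses Jaccard similarity over word types; that metric is my assumption for illustration, not necessarily the one behind the charts above. Feeding it lemmas instead of surface words gives the lemma version:

```python
def jaccard(a: set, b: set) -> float:
    """Share of distinct tokens common to both sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(editions: dict) -> dict:
    """Return {(edition_x, edition_y): score} for every ordered pair of editions."""
    tokens = {name: set(text.split()) for name, text in editions.items()}
    names = sorted(tokens)
    return {(x, y): jaccard(tokens[x], tokens[y]) for x in names for y in names}


# Placeholder token strings standing in for full edition texts.
sample_editions = {"edition_a": "α β γ", "edition_b": "β γ δ"}
scores = similarity_matrix(sample_editions)
```

The resulting dictionary can be reshaped into a square table and handed to any plotting library to draw the heatmap.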

Model Unsupervised Topics


Apostolos Paul vs. Canonical Paul
Topic Modeling gamma chart

And So Much More



to help create a future where science and religion harmonize

Call for Research Collaborations



Ways researchers, faculty, and classes/students can help!

  • XML encodings of attestations by Epiphanius, Tertullian, etc.
  • XML editions of the Nag Hammadi Codices and other 2nd-4th century texts
  • manuscript transcriptions of Luke and the canonical Letters of Paul
  • adding new annotation layers
  • building knowledge graphs
  • connecting with/to LOD ontologies
  • collaborating on digital critical editions
  • statistical/quantitative analysis
  • convening/funding international projects akin to Q & IGNTP

If you would like to help, please reach out to Markus Vinzent and/or Mark Bilby
with your CV and a letter of interest!